UNIFORM CONVERGENCE OF VAPNIK-CHERVONENKIS 
CLASSES UNDER ERGODIC SAMPLING 

By Terrence M. Adams and Andrew B. Nobel 1 

Department of Defense and University of North Carolina at Chapel Hill 

We show that if $\mathcal{X}$ is a complete separable metric space and
$\mathcal{C}$ is a countable family of Borel subsets of $\mathcal{X}$ with finite
VC dimension, then, for every stationary ergodic process with values in
$\mathcal{X}$, the relative frequencies of sets $C \in \mathcal{C}$ converge
uniformly to their limiting probabilities. Beyond ergodicity, no
assumptions are imposed on the sampling process, and no regularity
conditions are imposed on the elements of $\mathcal{C}$. The result extends
existing work of Vapnik and Chervonenkis, among others, who have studied
uniform convergence for i.i.d. and strongly mixing processes. Our method
of proof is new and direct: it does not rely on symmetrization
techniques, probability inequalities or mixing conditions. The uniform
convergence of relative frequencies for VC-major and VC-graph classes of
functions under ergodic sampling is established as a corollary of the
basic result for sets. 

1. Introduction. The strong law of large numbers and its extension to 
dependent processes via the ergodic theorem is one of the central results of 
probability theory. The strong law connects sampling and population-based 
quantities, and is one of the basic tools for establishing the consistency of 
statistical inference procedures. Uniform laws of large numbers extend the 
strong law by guaranteeing the uniform convergence of averages to their 
limiting expectations over a given family of functions. Uniform laws of large 
numbers have been widely used and extensively studied in a number of 
fields, including statistics, where they play a foundational role in the theory 
of empirical processes and machine learning. In the latter, they underlie 
many results on consistency and rates of convergence for classification and 
regression procedures. 
The majority of the work on uniform laws of large numbers to date has 
considered independent, identically distributed samples, although there is 
also a substantial literature concerned with dependent sequences satisfying 
a variety of mixing conditions. The primary focus of this paper is the uniform 
convergence of relative frequencies over a family of sets for general ergodic 
processes. In particular, we show that a sufficient condition for uniform con- 
vergence in the i.i.d. case, namely having finite Vapnik-Chervonenkis (VC) 
dimension, is also sufficient to ensure uniform convergence in the ergodic 
case. The VC dimension is a combinatorial quantity that describes the abil- 
ity of a collection of sets to pick apart finite subsets of points. It can be 
defined without reference to metrics, epsilon-coverings, metric entropies or 
standard notions of vector space dimension. 

Let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary sequence of random variables
taking values in a complete separable metric space $\mathcal{X}$ equipped with
its associated Borel sigma-field $\mathcal{S}$. Under the standard definition,
$\mathbf{X}$ is ergodic if its invariant sigma-field is trivial (cf. Definition
6.30 in Breiman [4]). An equivalent, mixing-based definition of ergodicity
can be formulated as follows. For each $k \geq 1$, let $\mathcal{S}^k$ denote the
usual product sigma-field on $\mathcal{X}^k$. The process $\mathbf{X}$ is then
ergodic if, for each $k \geq 1$ and every $A, B \in \mathcal{S}^k$,

(1)    $\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} P(X_1^k \in A,\, X_{i+1}^{i+k} \in B) = P(X_1^k \in A)\, P(X_1^k \in B),$

where $X_i^j$ denotes the tuple $(X_i, \ldots, X_j)$; in particular, $X_1^k$ is
the $k$-tuple $(X_1, \ldots, X_k)$. The condition simply states that, on
average, the present and the future of $\mathbf{X}$ become independent as the
gap between them grows. 

Suppose that $\mathbf{X}$ is ergodic. Here, and in what follows, we let $X$
denote a random variable independent of $\mathbf{X}$ and having the same
distribution as $X_1$. For each set $C \in \mathcal{S}$, the ergodic theorem
ensures that the relative frequency $m^{-1} \sum_{i=1}^{m} I_C(X_i)$ of $C$
converges almost surely to the probability $P(X \in C)$ as $m$ tends to
infinity. Of interest here are families of sets over which this
convergence is uniform. To this end, we define the random variables

(2)    $\Gamma_m(\mathcal{C}:\mathbf{X}) = \sup_{C \in \mathcal{C}} \left| \frac{1}{m} \sum_{i=1}^{m} I(X_i \in C) - P(X \in C) \right|, \qquad m \geq 1.$

A countable family $\mathcal{C}$ of Borel measurable sets is said to be a
Glivenko-Cantelli class for $\mathbf{X}$ if the relative frequencies of
$C \in \mathcal{C}$ converge uniformly to their limiting probabilities, in the
sense that

(3)    $\Gamma_m(\mathcal{C}:\mathbf{X}) \to 0$ with probability one as $m \to \infty$.

Note that the uniformity here is over the family $\mathcal{C}$, not the
underlying sample space; following standard usage, the term "uniform
convergence" is used rather than the more traditional "equiconvergence."
The assumption that $\mathcal{C}$ is countable ensures that the supremum in (3)
is measurable. Uncountable families are discussed briefly below. 
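
As a small illustration of the quantity in (2), the following sketch evaluates the supremum for a finite family of intervals in [0, 1] under the uniform distribution. The function names `interval_indicator` and `gamma_m`, the choice of half-open intervals and the toy sample are assumptions made for this example only; they do not come from the paper.

```python
def interval_indicator(a, b):
    """Indicator of the half-open interval [a, b)."""
    return lambda x: 1 if a <= x < b else 0


def gamma_m(sample, family):
    """Largest gap between relative frequency and probability over a family.

    `family` is a list of (indicator, probability) pairs; the probabilities
    are supplied by the caller (here: interval lengths under the uniform law).
    """
    m = len(sample)
    return max(abs(sum(ind(x) for x in sample) / m - p) for ind, p in family)


# Toy sample of size m = 4 and a two-element family of intervals.
sample = [0.1, 0.2, 0.3, 0.9]
family = [
    (interval_indicator(0.0, 0.5), 0.5),
    (interval_indicator(0.5, 1.0), 0.5),
]
gap = gamma_m(sample, family)
```

Here three of the four sample points fall in [0, 0.5), so both intervals miss their probability by 1/4. For a countable family the maximum becomes a supremum, which is what the Glivenko-Cantelli property controls as m grows.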

Vapnik and Chervonenkis [24] established necessary and sufficient condi- 
tions for (3) under i.i.d. sampling. Their work provides a connection between 
uniform convergence and the combinatorial complexity of a family C, where 
the latter is measured by the ability of the family to break apart finite sets 
of points. Let $\mathcal{C}$ be any collection of subsets of $\mathcal{X}$ and let
$D \subseteq \mathcal{X}$ be any finite set of points. The shatter coefficient
(or index) of $\mathcal{C}$ with respect to $D$ is defined by

(4)    $S(D:\mathcal{C}) = |\{C \cap D : C \in \mathcal{C}\}|$

and is simply the number of distinct subsets of $D$ that can be captured by
sets $C \in \mathcal{C}$. Clearly, $S(D:\mathcal{C}) \leq 2^{|D|}$. When equality holds,
$\mathcal{C}$ is said to shatter the set $D$. The result of Vapnik and
Chervonenkis can be stated as follows. 

Theorem A (Vapnik and Chervonenkis [24]). If $X_1, X_2, \ldots$ are i.i.d.,
then the uniform strong law (3) holds if and only if

$\frac{1}{n} \log S(\{X_1, \ldots, X_n\} : \mathcal{C}) \to 0$

in probability as $n$ tends to infinity. 

In subsequent work, Vapnik and Chervonenkis [25] characterized uniform
convergence for classes of real-valued functions through the related
notion of metric entropy. Talagrand [22] later provided a characterization
of uniform convergence in the i.i.d. case that strengthens these results
and is focused on what happens when uniform convergence fails. For
nonatomic distributions, his results show that (3) fails to hold if and
only if there is a set $A \in \mathcal{S}$ with $P(A) > 0$ such that, for almost
every realization of $\mathbf{X}$, the family $\mathcal{C}$ shatters the set
$\{X_{n_1}, X_{n_2}, \ldots\}$ consisting of those $X_i$ that lie in $A$. 

Definition. The VC dimension of a family $\mathcal{C}$, denoted here by
$\dim(\mathcal{C})$, is the largest integer $k \geq 1$ such that
$S(D:\mathcal{C}) = 2^k$ for some $k$-element subset $D$ of $\mathcal{X}$. If, for
every $k \geq 1$, the family $\mathcal{C}$ can shatter some $k$-element set, then
$\dim(\mathcal{C}) = +\infty$. 
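
For a finite family over a finite ground set, the shatter coefficient (4) and the VC dimension can be computed by brute force. The sketch below is only meant to make the definitions concrete: the names `shatter_coefficient` and `vc_dimension`, the convention of returning 0 when no single point is shattered, and the example family of integer intervals are all assumptions of this illustration.

```python
from itertools import combinations


def shatter_coefficient(D, family):
    """Number of distinct traces C & D as C ranges over the family."""
    D = frozenset(D)
    return len({frozenset(C) & D for C in family})


def vc_dimension(ground, family):
    """Largest k such that some k-element subset of `ground` is shattered.

    Returns 0 if not even a single point is shattered (a convention chosen
    for this sketch). Stopping at the first failing k is valid because every
    subset of a shattered set is itself shattered.
    """
    best = 0
    for k in range(1, len(ground) + 1):
        if any(shatter_coefficient(D, family) == 2 ** k
               for D in combinations(ground, k)):
            best = k
        else:
            break
    return best


# Example: all "intervals" {i, ..., j} inside {1, 2, 3, 4}.
ground = [1, 2, 3, 4]
intervals = [set(range(i, j + 1)) for i in range(1, 5) for j in range(i, 5)]
dimension = vc_dimension(ground, intervals)
```

Intervals can pick out any subset of two points but can never select the two outer points of a triple without the middle one, so the family has VC dimension 2.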

A family C is said to be a VC class if dim(C) is finite. The following 
combinatorial result of Sauer provides polynomial bounds on the shatter 
coefficients of VC classes in terms of their combinatorial dimensions. 

Lemma A (Sauer [20]). If $\dim(\mathcal{C}) = V < \infty$, then
$S(D:\mathcal{C}) \leq \sum_{j=0}^{V} \binom{m}{j} \leq (m+1)^V$ for every $m \geq V$
and every $D \subseteq \mathcal{X}$ of cardinality $m$. 
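
The two bounds in Lemma A are easy to tabulate. The following sketch compares the binomial sum with the polynomial bound for a few pairs (m, V); the helper names `sauer_bound` and `polynomial_bound` and the chosen pairs are illustrative assumptions.

```python
from math import comb


def sauer_bound(m, V):
    """Sum of the binomial coefficients C(m, j) for j = 0, ..., V."""
    return sum(comb(m, j) for j in range(V + 1))


def polynomial_bound(m, V):
    """The cruder bound (m + 1)^V."""
    return (m + 1) ** V


# Each pair satisfies m >= V, as required by the lemma.
pairs = [(4, 2), (10, 3), (50, 5)]
table = [(m, V, sauer_bound(m, V), polynomial_bound(m, V)) for m, V in pairs]
```

The point of the lemma is visible in the table: for fixed V the number of traces grows only polynomially in m, far below the 2^m subsets of an m-point set.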
It follows from Lemma A and Theorem A that if $V = \dim(\mathcal{C}) < \infty$,
then $\mathcal{C}$ is a Glivenko-Cantelli class for every i.i.d. process
$\mathbf{X}$. Indeed, one may establish an exponential inequality of the form
$P(\Gamma_m(\mathcal{C}:\mathbf{X}) > t) \leq c_1 (m+1)^V e^{-c_2 m t^2}$ for every
$t > 0$ and $m \geq 1$, where $c_1$ and $c_2$ are constants that are
independent of $m$, $\mathcal{C}$ and the distribution of $X$ (cf. [5]). The
notions of VC class and VC dimension play a central role in modern central
limit and empirical process theory; see [6, 23] and the references therein. 

1.1. Principal result. In this paper, we show that the uniform strong law 
(3) holds for VC classes under general ergodic sampling schemes. No mixing 
conditions are imposed beyond ergodicity, and no conditions are imposed on 
the elements of C. Under these circumstances, the convergence guaranteed 
by the ergodic theorem can be arbitrarily slow and we cannot hope to obtain 
distribution-free probability bounds like those discussed above for the i.i.d. 
case. Nevertheless, asymptotic results are still possible. Our principal result 
is the following theorem; its proof can be found in Sections 2 and 3 below. 

Theorem 1. Let $\mathcal{X}$ be a complete separable metric space equipped
with its Borel measurable subsets $\mathcal{S}$ and let
$\mathcal{C} \subseteq \mathcal{S}$ be any countable family of sets. If
$\dim(\mathcal{C}) < \infty$, then, for every stationary ergodic process
$\mathbf{X} = X_1, X_2, \ldots$ taking values in $(\mathcal{X}, \mathcal{S})$,

(5)    $\Gamma_m(\mathcal{C}:\mathbf{X}) = \sup_{C \in \mathcal{C}} \left| \frac{1}{m} \sum_{i=1}^{m} I(X_i \in C) - P(X \in C) \right| \to 0$    w.p. 1,

as $m$ tends to infinity. In other words, $\mathcal{C}$ is a Glivenko-Cantelli
class for every stationary ergodic process. 

1.2. Uncountable families of sets. The assumption that the family
$\mathcal{C}$ is countable ensures that the suprema
$\Gamma_m(\mathcal{C}:\mathbf{X})$ are measurable and is required for the
construction of the isomorphism in Lemma 6. In addition, countability of
$\mathcal{C}$ is used in the proof of Proposition 3 to ensure that no sample
$X_i$ takes values in the boundary of any set $C \in \mathcal{C}$. 

Although it can be weakened in many cases (see the discussion below),
the assumption that $\mathcal{C}$ is countable cannot be dropped altogether
since it excludes somewhat pathological examples that may arise in the
dependent setting. To illustrate, let $\mu$ be a nonatomic measure on
$(\mathcal{X}, \mathcal{S})$ and let $T : \mathcal{X} \to \mathcal{X}$ be an ergodic
$\mu$-measure-preserving bijection of $\mathcal{X}$. (More concretely, one may
take $T$ to be an irrational rotation of the unit circle with its uniform
measure.) Let $T^i$ denote the $i$-fold composition of $T$ with itself if
$i \geq 1$, the $|i|$-fold composition of $T^{-1}$ with itself if $i \leq -1$
and the identity if $i = 0$. For each $x \in \mathcal{X}$, let
$C_x = \bigcup_{i=-\infty}^{\infty} \{T^i x\}$ be the trajectory of $x$ under $T$
and define the family $\mathcal{C} = \{C_x : x \in \mathcal{X}\}$. It is easy to see
that for any two points $x_1, x_2 \in \mathcal{X}$, either $C_{x_1} = C_{x_2}$ or
$C_{x_1} \cap C_{x_2} = \emptyset$, and so the VC dimension of $\mathcal{C}$ equals
one. Now, let $X_i = T^i X_0$, where $X_0 \in \mathcal{X}$ is distributed
according to $\mu$. The process $\mathbf{X} = X_0, X_1, \ldots$ is then stationary
and ergodic. Moreover, the $\mu$-measure of the countable set $C_x$ is zero
for every $x$ and it is easy to see that $\Gamma_m(\mathcal{C}:\mathbf{X}) = 1$ with
probability one for each $m \geq 1$. Thus, (5) fails to hold. 
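
The rotation example can be explored numerically. The sketch below follows an orbit of the rotation x -> x + alpha (mod 1) and measures how far the relative frequency of one fixed interval is from its length. The golden-ratio rotation number, the deterministic starting point 0.0, the interval [0.2, 0.5) and the function names are assumptions chosen for this illustration; in the text the starting point is random.

```python
import math


def rotation_orbit(x0, alpha, m):
    """First m points x0, T x0, ..., T^(m-1) x0 of T x = x + alpha (mod 1)."""
    return [(x0 + i * alpha) % 1.0 for i in range(m)]


def interval_discrepancy(orbit, a, b):
    """|relative frequency of [a, b) along the orbit - length of [a, b)|."""
    freq = sum(1 for x in orbit if a <= x < b) / len(orbit)
    return abs(freq - (b - a))


alpha = (math.sqrt(5) - 1) / 2          # an irrational rotation number
orbit = rotation_orbit(0.0, alpha, 10_000)
gap = interval_discrepancy(orbit, 0.2, 0.5)
```

For a single interval the gap is small, as the ergodic theorem predicts. By contrast, every point of the orbit lies in the trajectory set of the starting point, a countable set of measure zero, so the supremum over the uncountable family of trajectories stays equal to 1 for every m.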

In spite of such negative examples, Theorem 1 can be extended in a
straightforward way to uncountable classes $\mathcal{C}$ under a natural
approximation condition. We will call an uncountable family
$\mathcal{C} \subseteq \mathcal{S}$ "nice" for a given process $\mathbf{X}$ if
$\Gamma_m(\mathcal{C}:\mathbf{X})$ is measurable for each $m \geq 1$ and if, for
every $\epsilon > 0$, there exists a countable subfamily
$\mathcal{C}_0 \subseteq \mathcal{C}$ such that
$\limsup_m \Gamma_m(\mathcal{C}:\mathbf{X}) \leq \limsup_m \Gamma_m(\mathcal{C}_0:\mathbf{X}) + \epsilon$
with probability one. If $\mathcal{C}$ has finite VC dimension, then (5) holds
for every ergodic process $\mathbf{X}$ such that $\mathcal{C}$ is nice for
$\mathbf{X}$. 

Theorem 1 can also be extended to the case in which the elements of
$\mathcal{C}$ belong to the completion of the Borel sigma-field of
$\mathcal{X}$ with respect to the common distribution of the $X_i$. 

1.3. Families of functions. Theorem 1 can be used to establish two
related uniform convergence results for families of functions. These
results are presented below. In each case, the results can be extended to
uncountable families $\mathcal{F}$ under approximation conditions like those
above for families of sets. 

A countable family $\mathcal{F}$ of Borel measurable functions
$f : \mathcal{X} \to \mathbb{R}$ is said to be a Glivenko-Cantelli class for a
stationary ergodic process $\mathbf{X}$ if the relative frequencies of
functions in $\mathcal{F}$ converge uniformly to their limiting expectations,
that is,

(6)    $\Gamma_m(\mathcal{F}:\mathbf{X}) = \sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^{m} f(X_i) - E f(X) \right| \to 0$    w.p. 1 as $m \to \infty$.

Here, we assume that the expectation $E f(X)$ is well defined for each
$f \in \mathcal{F}$. Recall that a measurable function
$F : \mathcal{X} \to [0, \infty)$ is said to be an envelope for $\mathcal{F}$ if
$|f(x)| \leq F(x)$ for each $x \in \mathcal{X}$ and $f \in \mathcal{F}$. In
particular, $\mathcal{F}$ is bounded if it has constant envelope
$F = M < \infty$. 

1.3.1. VC-major classes. Let $L_f(\alpha) = \{x : f(x) \leq \alpha\}$ denote the
$\alpha$ level set of a function $f : \mathcal{X} \to \mathbb{R}$. A family of
functions $\mathcal{F}$ is said to be a VC-major class if

$\dim_{VC}(\mathcal{F}) = \sup_{\alpha \in \mathbb{R}} \dim(\{L_f(\alpha) : f \in \mathcal{F}\})$

is finite. The following result is established in Section 4.1. 

Proposition 1. Let $\mathcal{F}$ be a countable family of Borel measurable
functions $f : \mathcal{X} \to \mathbb{R}$ with envelope $F$. If $\mathcal{F}$ is a
VC-major class, then (6) holds for every stationary ergodic process
$\mathbf{X}$ such that $E F(X)$ is finite. 
1.3.2. VC-graph classes. The graph of a function
$f : \mathcal{X} \to \mathbb{R}$ is the set
$G_f \subseteq \mathcal{X} \times \mathbb{R}$ defined by
$G_f = \{(x, s) : 0 \leq s \leq f(x) \text{ or } f(x) \leq s \leq 0\}$. A family
$\mathcal{F}$ of functions $f : \mathcal{X} \to \mathbb{R}$ is said to be a VC-graph
class (Pollard [18]) if

$\dim_G(\mathcal{F}) = \dim(\{G_f : f \in \mathcal{F}\})$

is finite. The following result is established in Section 4.2. 

Proposition 2. Let $\mathcal{F}$ be a bounded, countable family of Borel
measurable functions $f : \mathcal{X} \to \mathbb{R}$. If $\mathcal{F}$ is a VC-graph
class, then (6) holds for every stationary ergodic process $\mathbf{X}$. 

1.4. Related work. Steele [21] used subadditive ergodic theory to
establish that both $\Gamma_m(\mathcal{C}:\mathbf{X})$ [see (2)] and the entropy
$n^{-1} \log S(\{X_1, \ldots, X_n\} : \mathcal{C})$ [see (4)] converge with
probability one to nonnegative constants whenever $\mathbf{X}$ is ergodic. In
addition, he obtained refined necessary and sufficient conditions for
uniform strong laws in the i.i.d. case. Nobel [10] showed that the
conditions of Theorem A and Talagrand [22] do not characterize uniform
convergence for ergodic processes and, in particular, that standard random
entropy conditions do not ensure uniform convergence in the ergodic case. 

Yukich [27] established rates of convergence for
$\Gamma_m(\mathcal{F}:\mathbf{X})$ when $\mathbf{X}$ is $\phi$-mixing and
$\mathcal{F}$ satisfies suitable bracketing entropy conditions. Yu [26]
extends these results to $\beta$-mixing (absolutely regular) processes
$\mathbf{X}$ and classes $\mathcal{F}$ satisfying metric entropy conditions. (See
Bradley [3] for more on $\phi$- and $\beta$-mixing conditions.) For VC classes
$\mathcal{C}$, the results of Yu imply the uniform law (5) when the mixing
coefficients $\beta_k$ decrease as $k^{-r}$ for some $r > 0$. Work of Peskir
and Yukich [17] extends this conclusion to $\beta$-mixing processes with
$\beta_k = (\log k)^{-2}$. 

Nobel and Dembo [11] showed that one may extend uniform strong laws
from i.i.d. processes to $\beta$-mixing processes with the same
one-dimensional marginal distribution. Their result implies that (5) holds
for any VC class $\mathcal{C}$ and any $\beta$-mixing process $\mathbf{X}$. Peligrad
[13] established an analogous result for processes satisfying a modified
$\phi$-mixing condition. Karandikar and Vidyasagar [8] extended the results
of [11] to families of processes and established rates of convergence
depending on the behavior of the mixing coefficients. 

Extending earlier work of Hoffmann-Jørgensen [7] in the i.i.d. case, Peskir
and Weber [16] show that the uniform ergodic theorem (6) holds if and only
if the family $\mathcal{F}$ is, in their terminology, eventually totally
bounded in mean. They also note the equivalence of different notions of
convergence, as in Steele's work. Peskir [15] investigated conditions for
uniform mean square ergodic theorems for families of weak-sense stationary
processes. 

Andrews [1] investigated sufficient conditions under which laws of large 
numbers can be extended from individual functions to classes of functions, 
with particular emphasis on stochastically equicontinuous classes indexed by 
totally bounded parameter spaces. The bibliography of his paper provides a 
good overview of related work. 

1.5. Overview. In the absence of independence or standard uniform
mixing conditions, a direct approach to Theorem 1 using symmetrization and
exponential-type inequalities, or a more indirect approach carried out by
coupling with the independent case, does not appear to be possible.
Instead, we establish, without reference to independence or mixing
conditions, the contrapositive of Theorem 1: if the relative frequencies of
sets $C \in \mathcal{C}$ fail to converge uniformly, then, for each $L \geq 1$, we
can find $L$ points $x_1, \ldots, x_L \in \mathcal{X}$ that are shattered by
$\mathcal{C}$ and, consequently, $\dim(\mathcal{C}) = \infty$. For this, we require
only the almost sure convergence guaranteed by the ergodic theorem for
individual sets. Rather than working directly with the shatter coefficients
$S(\cdot : \mathcal{C})$, we consider joins (partitions) generated by finite
subcollections of $\mathcal{C}$, which are defined in Section 2 below. 

In the next section, we begin with a special case of Theorem 1 in which
$\mathcal{X} = [0, 1]$, each $X_i$ is uniformly distributed on $\mathcal{X}$ and
each element of $\mathcal{C}$ is equal to a finite union of intervals. This
preliminary result, which is the core of the paper, is contained in
Proposition 3. The general case of Theorem 1 is established in Section 3
using Proposition 3 and a series of three reductions. The first reduction
(contained in Lemma 5) shows that it is enough to consider processes
$\mathbf{X}$ for which the marginal distribution of the $X_i$ is nonatomic.
The second reduction maps the random variables in $\mathbf{X}$ and the
elements of $\mathcal{C}$ to the unit interval with Lebesgue measure via a
standard measure space isomorphism. The final reduction (contained in
Lemma 6) makes use of an additional measure space isomorphism that maps
each element of $\mathcal{C}$ into a set that is equal, up to a set of measure
zero, to a finite union of intervals. 

2. Classes containing finite unions of intervals. In this section, we es- 
tablish a version of Theorem 1 in which X = [0, 1] and each element of C is 
a finite union of intervals. In the proof, we work with the joins of selected 
members of C, which act as surrogates for the more commonly used shatter 
coefficients. 

Definition. The join of $k$ sets $A_1, \ldots, A_k \subseteq \mathcal{X}$, denoted
$J = \bigvee_{i=1}^{k} A_i$, is the collection of all nonempty intersections
$\tilde{A}_1 \cap \cdots \cap \tilde{A}_k$, where $\tilde{A}_i \in \{A_i, A_i^c\}$ for
$i = 1, \ldots, k$. Note that $J$ is a partition of $\mathcal{X}$. The join of
$A_1, \ldots, A_k$ is said to be full if it has (maximal) cardinality $2^k$. 
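
On a finite ground set the join can be enumerated directly by choosing, for each set, either the set or its complement. The sketch below does this; the function names `join` and `is_full` and the four-point ground set are assumptions introduced only for illustration.

```python
from itertools import product


def join(ground, sets):
    """Nonempty cells obtained by intersecting each set or its complement."""
    ground = set(ground)
    cells = []
    for choice in product([True, False], repeat=len(sets)):
        cell = set(ground)
        for keep, A in zip(choice, sets):
            cell &= (set(A) if keep else ground - set(A))
        if cell:
            cells.append(frozenset(cell))
    return cells


def is_full(ground, sets):
    """A join of k sets is full when it has all 2^k cells."""
    return len(join(ground, sets)) == 2 ** len(sets)


ground = range(4)
full_pair = [{0, 1}, {0, 2}]         # cells {0}, {1}, {2}, {3}
nested_pair = [{0, 1}, {0, 1, 2}]    # the cell "in the first, not the second" is empty
```

The cells always partition the ground set. Nested sets can never produce a full join, because one of the four possible intersections is empty.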

The next lemma makes an elementary connection between full joins and 
the VC dimension. A similar result appears in [9] as Lemma 10.3.4. We 
include a short proof here for completeness. 
Lemma 1. Let $\mathcal{C}$ be any collection of subsets of $\mathcal{X}$. If, for
some $k \geq 1$, there exists a collection $\mathcal{C}_0 \subseteq \mathcal{C}$ of
$2^k$ sets having a full join, then $\dim(\mathcal{C}) \geq k$. 

Proof. Indexing the elements of $\mathcal{C}_0$ in an arbitrary manner by
subsets of $[k] := \{1, \ldots, k\}$, we may write
$\mathcal{C}_0 = \{C(U) : U \subseteq [k]\}$. For $i = 1, \ldots, k$, let $x_i$ be any
element of the intersection

$\bigcap_{U : i \in U} C(U) \;\cap \bigcap_{U : i \notin U} C(U)^c,$

which is nonempty by assumption. For each subset $V \subseteq [k]$, it is easy
to see that $x_i \in C(V)$ if and only if $i \in V$. Thus, $\mathcal{C}_0$, and
hence $\mathcal{C}$, shatters $\{x_1, \ldots, x_k\}$. □
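
The construction in the proof of Lemma 1 can be carried out mechanically: index 2^k sets by the subsets of {1, ..., k} and, for each i, pick a point lying in exactly those sets whose index contains i. The sketch below does this for k = 2. The names `subsets` and `shattered_points`, the ground set {0, ..., 15} and the bit-based sets are assumptions of this example, chosen so that the join is full.

```python
from itertools import combinations


def subsets(k):
    """All subsets of {1, ..., k}, as frozensets, ordered by size."""
    items = range(1, k + 1)
    return [frozenset(c) for r in range(k + 1) for c in combinations(items, r)]


def shattered_points(ground, C, k):
    """Pick x_i in every C(U) with i in U and outside every C(U) with i not in U.

    `C` maps each subset U of {1, ..., k} to a set C(U). `min` raises
    ValueError if a required cell is empty, i.e. if the join is not full.
    """
    ground = set(ground)
    points = []
    for i in range(1, k + 1):
        cell = set(ground)
        for U, CU in C.items():
            cell &= (CU if i in U else ground - CU)
        points.append(min(cell))
    return points


k = 2
ground = range(16)
# The b-th index set U is assigned the numbers whose b-th binary digit is 1;
# these four sets have a full join (16 singleton cells).
C = {U: {n for n in ground if (n >> b) & 1} for b, U in enumerate(subsets(k))}
points = shattered_points(ground, C, k)
traces = {frozenset(CU) & frozenset(points) for CU in C.values()}
```

The resulting pair of points is shattered: the four sets cut out the empty set, both singletons and the whole pair.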

Now, let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary ergodic process defined
on a probability space $(\Omega, \mathcal{F}, P)$, such that each $X_i$ takes
values in $[0, 1]$, is Borel measurable and has distribution equal to the
Lebesgue measure $\lambda(\cdot)$ on $[0, 1]$. 



Proposition 3. Let $\mathcal{C}_0$ be a countable family of subsets of
$[0, 1]$, each of whose elements is a finite union of intervals. Suppose
that

$\limsup_{m \to \infty} \Gamma_m(\mathcal{C}_0 : \mathbf{X}) > 0$

with positive probability. Then, for each integer $L \geq 1$, there exist sets
$D_1, D_2, \ldots, D_L \in \mathcal{C}_0$ such that the join
$K_L = D_1 \vee D_2 \vee \cdots \vee D_L$ is full and each cell of $K_L$ has
positive Lebesgue measure. 

Remarks. It follows from Lemma 1 that the family $\mathcal{C}_0$ in
Proposition 3 has infinite VC dimension. The additional fact that each cell
of the joins has positive measure will be needed in the proof of Theorem 1
as we may then ignore sets of measure zero that arise in the application of
Lemma 6. The assumption that $C \in \mathcal{C}_0$ is a finite union of
intervals guarantees that its boundary has Lebesgue measure zero. Excluding
such boundary points from the process $\mathbf{X}$ plays an important role in
the final part of the proof of Lemma 3. 

Proof of Proposition 3. In what follows, we will need to examine the
difference between the relative frequency and probability of subsets of
the unit interval. To this end, for each $\omega \in \Omega$, each
$A \subseteq [0, 1]$ and each $m \geq 1$, we define

(7)    $\Delta_\omega(A : m) = \left| \frac{1}{m} \sum_{i=1}^{m} I(X_i(\omega) \in A) - \lambda(A) \right|$

to be the discrepancy of $A$ with respect to the first $m$ elements of the
sample sequence $X_i(\omega)$. Let $B^\circ$, $\bar{B}$ and
$\partial B = \bar{B} \setminus B^\circ$ denote, respectively, the interior,
closure and boundary of a set $B \subseteq [0, 1]$. 

For $n \geq 1$, let
$\mathcal{D}_n = \{[k 2^{-n}, (k+1) 2^{-n}] : 0 \leq k \leq 2^n - 1\}$ be the set of
closed dyadic intervals of order $n$. Let $\mathcal{D}$ be the union of the
families $\mathcal{D}_n$ and let $\mathcal{C} = \mathcal{C}_0 \cup \mathcal{D}$. Then
$\mathcal{C}$ and the set $A_0 = \bigcup_{C \in \mathcal{C}} \partial C$ of all
endpoints of elements of $\mathcal{C}$ are countable. In particular,
$\lambda(A_0) = 0$. By removing a $P$-null set of outcomes from our sample
space, we can, and do, assume that $X_i(\omega) \notin A_0$ for every $i \geq 1$
and every $\omega \in \Omega$. 

Recalling the definition (2), we see that
$\Gamma_m(\mathcal{C} : \mathbf{X}) \geq \Gamma_m(\mathcal{C}_0 : \mathbf{X})$ and,
therefore, $\limsup_m \Gamma_m(\mathcal{C} : \mathbf{X}) > 0$ with positive
probability. In particular, there exists an $\eta > 0$ and a set
$E \in \mathcal{F}$ with $P(E) > 0$ such that

(8)    $\limsup_{m \to \infty} \sup_{C \in \mathcal{C}} \Delta_\omega(C : m) > \eta$ for each $\omega \in E$.

(Using the results of Steele [21] or, alternatively, the invariance of $E$,
it follows that $P(E) = 1$, but we do not require this stronger result
here.) Fix $0 < \delta < \min\{\eta/12, P(E)\}$. 

The remainder of the proof proceeds as follows. We first construct a
sequence of "splitting sets" $R_1, R_2, \ldots \subseteq [0, 1]$, in stages, from
the sets in $\mathcal{C}$. At the $k$th stage, the splitting set is obtained
from a sequential procedure that makes use of the splitting sets
$R_1, \ldots, R_{k-1}$ produced at previous stages. Once obtained, the
splitting sets are used to identify, for any $L \geq 1$, a collection of $L$
sets in $\mathcal{C}$ that have full join and it is easy to show that at most
one member of such a collection can come from $\mathcal{D}$. The final step of
the proof requires that we keep track of the process by which each
splitting set $R_k$ is produced; this requirement is reflected in the
notation adopted below. 



Construction of $R_1$. We first choose a sequence of sets
$C_1, C_2, \ldots \in \mathcal{C}$ in such a way that a significant fraction of
the cells in the join of $C_1, \ldots, C_n$ will intersect both $C_{n+1}$ and
its complement. Let $C_1$ be any set in $\mathcal{C}$. Suppose that
$C_1, \ldots, C_n \in \mathcal{C}$ have already been selected and we wish to
choose $C_{n+1}$. Let $J_n = \mathcal{D}_n \vee C_1 \vee \cdots \vee C_n$ be the join
of the previously selected sets and the dyadic intervals of order $n$.
Since the process $\mathbf{X}$ is ergodic and $J_n$ is finite, there exists an
integer $M$ and a set $F$ with $P(F) \geq 1 - \delta$ such that

(9)    $\Delta_\omega(A : m) \leq \delta \lambda(A)$ for each $\omega \in F$, $A \in J_n$ and $m \geq M$.

As $\delta < P(E)$, the set $E \cap F$ has positive $P$-measure and is therefore
nonempty. Let $\omega_{n+1}$ be any point in $E \cap F$. As
$\omega_{n+1} \in E$, it follows from (8) that there exists a set
$C_{n+1} \in \mathcal{C}$ and an integer $m_{n+1} \geq M$ such that
$\Delta_{\omega_{n+1}}(C_{n+1} : m_{n+1}) > \eta$. From $C_{n+1}$, one may
construct the join
$J_{n+1} = \mathcal{D}_{n+1} \vee C_1 \vee \cdots \vee C_{n+1}$ and then select
$C_{n+2}$ in the same manner as $C_{n+1}$. Continuing in this fashion, we
obtain joins $J_{n+1}, J_{n+2}, \ldots$ and sets
$C_{n+2}, C_{n+3}, \ldots \in \mathcal{C}$. We note that the sample points
$\omega_n$ may vary from step to step and that there is no requirement that
$m_{n+1}$ be greater than $m_n$. 

The choice of the set $C_{n+1}$ ensures that it cannot be well approximated
by a union of elements of $J_n$ or, equivalently, that the collection of
cells $A \in J_n$ containing points in $C_{n+1}$ and $C_{n+1}^c$ must have
nonvanishing probability. To make this idea precise, we define the family

$H_n = \left\{ A \in J_n : \Delta_{\omega_{n+1}}(C_{n+1} \cap A : m_{n+1}) > \frac{\eta}{2} \lambda(A) \right\}.$

The next lemma shows that the elements of $H_n \subseteq J_n$ occupy a
nonvanishing fraction of the unit interval. 

Lemma 2. If $G_n = \bigcup H_n$ is the union of the sets $A \in H_n$, then
$\lambda(G_n) \geq \eta/6$. 

Proof. Let $\omega = \omega_{n+1}$, $C = C_{n+1}$ and $m = m_{n+1}$. By
decomposing $\Delta_\omega(C : m)$ among the elements of $J_n$, we obtain the
following bound:

(10)    $\eta < \Delta_\omega(C : m) \leq \sum_{A \in J_n} \Delta_\omega(C \cap A : m) = \sum_{A \in H_n} \Delta_\omega(C \cap A : m) + \sum_{A \in J_n \setminus H_n} \Delta_\omega(C \cap A : m).$

By definition of $H_n$, the second term in (10) is at most

$\sum_{A \in J_n \setminus H_n} \frac{\eta}{2} \lambda(A) \leq \frac{\eta}{2}.$

Moreover, the first term in (10) can be bounded as follows:

$\sum_{A \in H_n} \Delta_\omega(C \cap A : m) \leq \sum_{A \in H_n} \left[ \frac{1}{m} \sum_{i=1}^{m} I(X_i(\omega) \in A) + \lambda(A) \right]$
$\leq \sum_{A \in H_n} \Delta_\omega(A : m) + 2 \lambda(G_n)$
$\leq (\delta + 2) \lambda(G_n) \leq 3 \lambda(G_n),$

where the penultimate inequality follows from (9) and the fact that
$\omega_{n+1} \in F$. Combining the final expressions in the three preceding
displays yields the result. □



Let the sets $G_n = \bigcup H_n$, $n \geq 1$, be derived from the inductive
procedure described above. For each $n \geq 1$, define a sub-probability
measure $\lambda_n(B) = \lambda(B \cap G_n)$ on $([0, 1], \mathcal{B})$. The
collection $\{\lambda_n\}$ is necessarily tight and therefore has a
subsequence $\{\lambda_{n_r}\}$ that converges weakly to a sub-probability
$\nu_1$ on $([0, 1], \mathcal{B})$, in the sense that
$\int g \, d\lambda_{n_r} \to \int g \, d\nu_1$ as $r \to \infty$ for every
(bounded) continuous function $g : [0, 1] \to \mathbb{R}$. It is easy to see that
$\nu_1$ is absolutely continuous with respect to $\lambda$ and that

$\nu_1([0, 1]) \geq \limsup_{r \to \infty} \lambda_{n_r}([0, 1]) \geq \eta/6.$

In particular, the Radon-Nikodym derivative $d\nu_1/d\lambda$ is well defined
and is bounded above by 1. Define
$R_1 = \{x : (d\nu_1/d\lambda)(x) > \delta\}$. From the previous remarks, it
follows that

(11)    $\frac{\eta}{6} \leq \int_0^1 \frac{d\nu_1}{d\lambda} \, d\lambda = \int_{R_1} \frac{d\nu_1}{d\lambda} \, d\lambda + \int_{R_1^c} \frac{d\nu_1}{d\lambda} \, d\lambda \leq \int_{R_1} 1 \, d\lambda + \int_{R_1^c} \delta \, d\lambda \leq \lambda(R_1) + \delta.$

As $\delta < \eta/12$ by assumption, we conclude that
$\lambda(R_1) \geq \eta/12 > 0$. 

Construction of $R_k$ for $k \geq 2$. The splitting sets $R_2, R_3, \ldots$ are
defined in order, following the general iterative procedure used to
construct $R_1$. The critical difference between the first and subsequent
stages is that the sets $R_1, \ldots, R_{k-1}$ produced at stages 1 through
$k - 1$ are included in the join used at stage $k$ to define $R_k$. In what
follows, let $C_k(n)$, $J_k(n)$, $\omega_k(n)$, $m_k(n)$, $H_k(n)$ and $G_k(n)$
denote the quantities appearing at the $n$th step of the $k$th stage. In
particular, let $C_1(n) = C_n$, $n \geq 1$, be the elements of $\mathcal{C}$
considered in stage 1 and define $J_1(n)$, $\omega_1(n)$, $m_1(n)$, $H_1(n)$
and $G_1(n)$ in a similar fashion. 

Suppose that stages 1 through $k - 1$ have been completed and that we wish
to construct the splitting set $R_k$ at stage $k$. Let $C_k(1)$ be any
element of $\mathcal{C}$ and suppose that $C_k(2), \ldots, C_k(n)$ have already
been selected. We define the join

(12)    $J_k(n) = \mathcal{D}_n \vee \bigvee_{j=1}^{k-1} R_j \vee \bigvee_{i=1}^{n} C_k(i).$

By the ergodic theorem, there exists an integer $M$ and a set $F$ with
$P(F) \geq 1 - \delta$ such that (9) holds with $J_n$ replaced by $J_k(n)$. As
before, it follows from these inequalities and (8) that there exists a
sample point $\omega_k(n+1) \in E \cap F$, a set $C_k(n+1) \in \mathcal{C}$ and an
integer $m_k(n+1) \geq M$ such that

(13)    $\Delta_{\omega_k(n+1)}(A : m_k(n+1)) \leq \delta \lambda(A)$ for each $A \in J_k(n)$,

and, simultaneously,

(14)    $\Delta_{\omega_k(n+1)}(C_k(n+1) : m_k(n+1)) > \eta.$

Using these quantities, we define the family

(15)    $H_k(n) = \left\{ A \in J_k(n) : \Delta_{\omega_k(n+1)}(C_k(n+1) \cap A : m_k(n+1)) > \frac{\eta}{2} \lambda(A) \right\}$

and let $G_k(n) = \bigcup H_k(n)$ be the union of the elements of $H_k(n)$. 

Defining $J_k(n+1)$ as in (12) and continuing in the same fashion, we
obtain a sequence $C_k(n+2), C_k(n+3), \ldots \in \mathcal{C}$ and a
corresponding sequence of sets $G_k(n+1), G_k(n+2), \ldots \subseteq [0, 1]$.
Lemma 2 ensures that $\lambda(G_k(n)) \geq \eta/6$ for each $n \geq 1$. As before,
there is a sequence of integers $n_k(1) < n_k(2) < \cdots$ such that the
measures $\lambda(B \cap G_k(n_k(r)))$ converge weakly as $r \to \infty$ to a
sub-probability measure $\nu_k$ on $([0, 1], \mathcal{B})$ that is absolutely
continuous with respect to $\lambda(\cdot)$. Define
$R_k = \{x : (d\nu_k/d\lambda)(x) > \delta\}$. The argument in (11) shows that
$\lambda(R_k) \geq \eta/12$. The arguments below require that we consider
density points of the splitting sets. With this in mind, for $k \geq 1$, let

$\tilde{R}_k = \left\{ x \in R_k : \lim_{\alpha \to 0} \frac{\lambda((x - \alpha, x + \alpha) \cap R_k)}{2\alpha} = 1 \right\}$

be the set of Lebesgue points of $R_k$. By standard results on
differentiation of integrals (cf. Theorem 31.3 of Billingsley [2]),
$\lambda(\tilde{R}_k) = \lambda(R_k) \geq \eta/12$. The sets $\tilde{R}_k$ are used
to construct full joins in the next step of the proof. 

Construction of full joins. Fix an integer $L \geq 2$. As the measures of
the sets $\tilde{R}_k$ are bounded away from zero, there exist positive
integers $k_1 < k_2 < \cdots < k_L$ such that
$\lambda(\bigcap_{j=1}^{L} \tilde{R}_{k_j}) > 0$. Define the intersections

$Q_r = \bigcap_{j=1}^{L-r} \tilde{R}_{k_j}$

for $r = 0, 1, \ldots, L - 1$. Note that
$Q_0 \subseteq Q_1 \subseteq \cdots \subseteq Q_{L-1}$. Recall that $S^\circ$, $\bar{S}$
and $\partial S$ denote, respectively, the interior, closure and boundary of a
set $S \subseteq [0, 1]$. 

Lemma 3. There exist sets $D_1, D_2, \ldots, D_{L-1} \in \mathcal{C}$ such that
for each $l = 1, \ldots, L - 1$, the join $K_l = D_1 \vee D_2 \vee \cdots \vee D_l$
satisfies $|K_l| = 2^l$ and for each $B \in K_l$, the intersection
$B^\circ \cap Q_l$ is nonempty. In particular, each cell of $K_l$ has positive
Lebesgue measure. 
Proof. We establish the result by induction on $l$, beginning with the
case $l = 1$. In particular, we show that there exists a set
$D_1 \in \mathcal{C}$ such that $D_1^\circ \cap Q_1$ and $(D_1^c)^\circ \cap Q_1$ are
nonempty. To this end, we choose $x_1 \in Q_0$, which is nonempty by
assumption, and let $\epsilon = \delta / (2(\delta + 1))$. By definition of the
sets $\tilde{R}_k$, there exists $\alpha_1 > 0$ such that the interval
$I_1 = (x_1 - \alpha_1, x_1 + \alpha_1)$ satisfies

(16)    $\lambda(I_1 \cap Q_0) \geq (1 - \epsilon) \lambda(I_1) = 2\alpha_1 (1 - \epsilon).$

To simplify notation, let $\kappa = k_L$. It follows from the last display and
the definition of $R_\kappa \supseteq Q_0$ that

(17)    $\nu_\kappa(I_1 \cap R_\kappa) = \int_{I_1 \cap R_\kappa} \frac{d\nu_\kappa}{d\lambda} \, d\lambda > \delta \lambda(I_1 \cap R_\kappa) \geq 2\alpha_1 (1 - \epsilon) \delta.$

Now, let $\{n_\kappa(r) : r \geq 1\}$ be the subsequence used to define the
sub-probability $\nu_\kappa$. As $I_1$ is an open set, the portmanteau theorem
and (17) imply that

$\liminf_{r \to \infty} \lambda(I_1 \cap G_\kappa(n_\kappa(r))) \geq \nu_\kappa(I_1) \geq \nu_\kappa(I_1 \cap R_\kappa) > 2\alpha_1 (1 - \epsilon) \delta.$

Choose $r$ sufficiently large so that
$\lambda(I_1 \cap G_\kappa(n_\kappa(r))) > 2\alpha_1 (1 - \epsilon) \delta$ and
$2^{-n_\kappa(r)} \leq \delta \alpha_1 / 4$. We require the following subsidiary
lemma. 

Lemma 4. There exists a set $A \in H_\kappa(n_\kappa(r))$ such that
$A \subseteq I_1$ and $\lambda(A \cap Q_1) > 0$. Moreover, $A$ is contained in
$Q_1$. 

Proof. Let $G = G_\kappa(n_\kappa(r))$. The choice of $n_\kappa(r)$ ensures that

$(1 - \epsilon) \delta \lambda(I_1) < \lambda(I_1 \cap G)$
$= \lambda(I_1 \cap Q_1 \cap G) + \lambda(I_1 \cap Q_1^c \cap G)$
$\leq \lambda(I_1 \cap Q_1 \cap G) + \lambda(I_1 \cap Q_1^c)$
$\leq \lambda(I_1 \cap Q_1 \cap G) + \epsilon \lambda(I_1),$

where the first inequality follows from our choice of $r$ and the final
inequality follows from (16) together with the fact that
$Q_0 \subseteq Q_1$. The last display and the definition of $\epsilon$ imply that
$\lambda(I_1 \cap Q_1 \cap G) > \delta \alpha_1$. As the collection of sets used
to define $J_\kappa(n_\kappa(r))$ includes the dyadic intervals of order
$n_\kappa(r)$, each element $A$ of the join has diameter (and Lebesgue
measure) bounded by $2^{-n_\kappa(r)} \leq \delta \alpha_1 / 4$. These last two
inequalities imply that

$\delta \alpha_1 < \lambda(I_1 \cap Q_1 \cap G) \leq \sum \lambda(Q_1 \cap A) + 2 \cdot \frac{\delta \alpha_1}{4},$

where the sum is over sets $A \in H_\kappa(n_\kappa(r))$ such that
$A \subseteq I_1$. In particular, it is clear that the sum is necessarily
positive and the first part of the claim follows. Note that
$A \in H_\kappa(n_\kappa(r))$ implies that $A \in J_\kappa(n_\kappa(r))$. Thus, the
inclusion of the sets $R_1, \ldots, R_{\kappa - 1}$ in the join ensures that $A$
is contained in either $R_{k_j}$ or $R_{k_j}^c$, but not both, for each
$j = 1, \ldots, L - 1$. If $\lambda(A \cap Q_1) > 0$, then, necessarily,
$A \cap Q_1 \neq \emptyset$, and the containment relations imply that
$A \subseteq Q_1$. This completes the proof of Lemma 4. □
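
The step from the chain of inequalities in the proof of Lemma 4 to the lower bound on the measure of the intersection rests on a short piece of arithmetic: with epsilon read as delta / (2 (delta + 1)), the quantity (1 - epsilon) delta - epsilon equals delta / 2, and multiplying by the interval length 2 alpha_1 gives delta alpha_1. The sketch below checks this identity exactly with rational arithmetic. The parenthesization of epsilon is an assumption about the intended reading of the definition, and the function names and sample values of delta are illustrative.

```python
from fractions import Fraction


def epsilon(delta):
    """epsilon = delta / (2 (delta + 1)), the assumed reading of the definition."""
    return delta / (2 * (delta + 1))


def margin(delta):
    """The factor (1 - epsilon) * delta - epsilon left after subtracting epsilon."""
    eps = epsilon(delta)
    return (1 - eps) * delta - eps


# Exact check for a few rational values of delta in (0, 1).
deltas = [Fraction(1, 10), Fraction(1, 3), Fraction(5, 7)]
checks = [margin(d) == d / 2 for d in deltas]
```

Algebraically, (1 - epsilon) delta - epsilon = delta - epsilon (delta + 1) = delta - delta / 2, which is why this particular choice of epsilon is convenient.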

Let D\ = C K (n K (r) + 1) 6 C, where r is the index appearing in Lemma 
4. Recall that Lq is a finite union of intervals and that no random vari- 
ables Xi take values in the finite set dD\. In addition, dD\ has Lebesgue 
measure zero. Let A G H K (n K (r)) be the set identified in Lemma 4 and note 
that X(A) > 0. We argue by contradiction that A (and therefore Q\) has 
nonempty intersection with the interiors of D\ and D\ . Suppose, first, that 
A n D\ = 0. In this case, 

A w (^ nDi:m) = A u (inDJ:m) = 

for every m > 1 and every wGfi. However, as j4 G H K (n K (r)) [see (15)] and 
A(A) > 0, we know that A U (A C\ Df.m) > when w = uj K {n R (r) + 1) and 
m = m K (n K (r) + 1). Thus, we arrive at a contradiction. 

Now, suppose that (Df)° Pi A = 0. In this case, A C Z?i and with the 
choice of U) = bJ K {n K {r) + 1) and m = m K (n K (r) + 1), we have 

|A(A) < A U (A nD 1 :m)=A ul {Ar\D 1 :m) = A"(A : m) < 5X(A). 

Here, the first inequality follows from the fact that $A \in H_\kappa(n_\kappa(r))$ and the
second follows from (13). Comparing the first and last terms above, the fact
that $\delta < \eta/12$ again yields a contradiction. We note that the argument above
applies to any set $A \in H_\kappa(n_\kappa(r))$ having positive Lebesgue measure.

Now, suppose that we have identified sets $D_1, \ldots, D_l \in \mathcal{C}$, with $l \le L - 2$,
such that the join $K_l = D_1 \vee \cdots \vee D_l$ satisfies the conditions of Lemma 3.
Let $K_l = \{B_j : j \in [2^l]\}$ and let $x_j \in B_j^o \cap Q_l$ for each $j \in [2^l]$. Select $\alpha_{l+1} > 0$
such that for each $j$, the interval $I_j = (x_j - \alpha_{l+1}, x_j + \alpha_{l+1})$ is contained in
$B_j$ and satisfies

\[ \lambda(I_j \cap Q_l) \ge (1 - \varepsilon)\lambda(I_j) = 2\alpha_{l+1}(1 - \varepsilon). \]

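Let $\kappa' = \kappa_{L-l}$ and let $\{n_{\kappa'}(r) : r \ge 1\}$ be the subsequence used to define the
sub-probability $\nu_{\kappa'}$. For each interval $I_j$,

\[ \liminf_{r \to \infty} \lambda\bigl(I_j \cap G_{\kappa'}(n_{\kappa'}(r))\bigr) \ge \nu_{\kappa'}(I_j) \ge \nu_{\kappa'}(I_j \cap R_{\kappa'}) > 2\alpha_{l+1}(1 - \varepsilon)\delta, \]

where the last inequality follows from the previous display and the fact that
$Q_l \subseteq R_{\kappa'}$. Choose $r$ sufficiently large so that
$\lambda(I_j \cap G_{\kappa'}(n_{\kappa'}(r))) > 2\alpha_{l+1}(1 - \varepsilon)\delta$ for each $j$, and $2^{-n_{\kappa'}(r)} < \delta\alpha_{l+1}/4$.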

By applying the proof of Lemma 4 to each interval $I_j$, it is easy to see that
there exist sets $A_j \in H_{\kappa'}(n_{\kappa'}(r))$ such that $A_j \subseteq I_j \subseteq B_j$, $\lambda(A_j \cap Q_{l+1}) > 0$
and $A_j \subseteq Q_{l+1}$. Let $D_{l+1} = C_{\kappa'}(n_{\kappa'}(r) + 1) \in \mathcal{C}$. Arguments identical to
the case $l = 1$ above show that for each $j$, the intersections $A_j \cap D_{l+1}^o$ and
$A_j \cap (D_{l+1}^c)^o$ are nonempty. This completes the inductive step, and hence
the proof, of Lemma 3.

Given any two dyadic intervals, they are disjoint, intersect at one point or
one contains the other. Therefore, among the sets $D_1, D_2, \ldots$ produced by Lemma
3, at most one can be a dyadic interval; the remainder are contained in $\mathcal{C}_0$
and together have a full join whose cells have positive Lebesgue measure.
This completes the proof of Proposition 3. □

3. Reductions and proof of Theorem 1. As noted in the Introduction, 
Theorem 1 is derived from Proposition 3 via a series of three reductions. 
Two of these reductions are based on the following lemmas, whose proofs 
can be found in the Appendix. The third follows from standard results on 
measure space isomorphisms. In what follows, $A \triangle B = (A \setminus B) \cup (B \setminus A)$ is
the standard symmetric difference of two sets.

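Lemma 5. Let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary ergodic process taking
values in $(\mathcal{X}, \mathcal{S})$ and let $\mathcal{C} \subseteq \mathcal{S}$ be a countable family of sets such that
$\limsup_m \Gamma_m(\mathcal{C} : \mathbf{X}) > 0$ with positive probability. Then $\mathcal{X}$ is necessarily
uncountable and there exists a stationary ergodic process $\tilde{\mathbf{X}} = \tilde{X}_1, \tilde{X}_2, \ldots$
with values in $(\mathcal{X}, \mathcal{S})$ such that $P(\tilde{X}_i = x) = 0$ for each $x \in \mathcal{X}$ and
$\limsup_m \Gamma_m(\mathcal{C} : \tilde{\mathbf{X}}) > 0$ with positive probability.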

Lemma 6. Let $\mathcal{C} = \{C_1, C_2, \ldots\}$ be a countable collection of Borel subsets
of $[0, 1]$ such that the maximum diameter of the elements of the join
$J_n = \bigvee_{i=1}^{n} C_i$ tends to zero as $n \to \infty$. Then there exist a Borel measurable map
$\phi : [0, 1] \to [0, 1]$ and a Borel set $V_1 \subseteq [0, 1]$ of measure one such that: (i) $\phi$
preserves Lebesgue measure and is one-to-one on $V_1$; (ii) the image
$V_2 = \phi(V_1)$ and the inverse map $\phi^{-1} : V_2 \to V_1$ are Borel measurable; (iii) $\phi^{-1}$
preserves Lebesgue measure; (iv) for every set $C \in \mathcal{C}$, there is a set $U(C)$,
equal to a finite union of intervals, such that $\lambda(\phi(C) \triangle U(C)) = 0$.

3.1. Proof of Theorem 1. We establish the contrapositive of Theorem
1 via a reduction to Proposition 3. Suppose that $\limsup_m \Gamma_m(\mathcal{C} : \mathbf{X}) > 0$
with positive probability. Let $\mu(\cdot)$ denote the one-dimensional marginal
distribution of $\mathbf{X}$. By Lemma 5, we may restrict our attention to the case in
which $\mu(\cdot)$ is nonatomic and $\mathcal{X}$ is uncountable. It then follows from standard
measure space isomorphism results [19] that there exist Borel measurable
sets $\mathcal{X}_0 \subseteq \mathcal{X}$ and $I_0 \subseteq [0, 1]$ with $\mu(\mathcal{X}_0) = \lambda(I_0) = 1$ and an invertible map
$\psi : \mathcal{X}_0 \to I_0$ such that $\psi$ and $\psi^{-1}$ are measurable with respect to the
restricted sigma-algebras $\mathcal{S} \cap \mathcal{X}_0$ and $\mathcal{B} \cap I_0$, respectively, and $\mu(A) = \lambda(\psi(A))$
for each $A \in \mathcal{S} \cap \mathcal{X}_0$. The event $E = \{X_i \notin \mathcal{X}_0 \text{ for some } i \ge 1\}$ has probability
zero, so by removing $E$ from the underlying sample space, we may assume
that $X_i(\omega) \in \mathcal{X}_0$ for each sample point $\omega$ and each $i \ge 1$.

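Define $Y_i = \psi(X_i)$ for $i \ge 1$ and let $\mathcal{C}_1 = \{\psi(C \cap \mathcal{X}_0) : C \in \mathcal{C}\}$ be the (Borel)
images in $[0, 1]$ of the elements of $\mathcal{C}$. The process $\mathbf{Y} = Y_1, Y_2, \ldots$ is
stationary and ergodic with marginal distribution $\lambda$. If $C_1 = \psi(C \cap \mathcal{X}_0)$ is an
element of $\mathcal{C}_1$, then $\lambda(C_1) = \mu(C \cap \mathcal{X}_0) = \mu(C)$ as $\mu(\mathcal{X}_0) = 1$, and
$I(Y_i \in C_1) = I(\psi(X_i) \in \psi(C \cap \mathcal{X}_0)) = I(X_i \in C)$ as $\psi(\cdot)$ is one-to-one. Moreover, if
$\mathcal{C}_1$ shatters points $u_1, \ldots, u_k \in [0, 1]$, then $\mathcal{C}$ shatters $\psi^{-1}(u_1), \ldots, \psi^{-1}(u_k)$. It follows
that $\Gamma_m(\mathcal{C}_1 : \mathbf{Y}) = \Gamma_m(\mathcal{C} : \mathbf{X})$ with probability one (actually, for every $\omega$) and
that $\dim(\mathcal{C}_1) \le \dim(\mathcal{C})$.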

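Let $\mathcal{C}_2 = \mathcal{C}_1 \cup \mathcal{D}$, where $\mathcal{D}$ denotes the set of closed dyadic subintervals
of $[0, 1]$. Then $\Gamma_m(\mathcal{C}_2 : \mathbf{Y}) \ge \Gamma_m(\mathcal{C}_1 : \mathbf{Y})$ and an easy argument shows that
$\dim(\mathcal{D}) = 2$. Using Lemma A (cf. Exercise 4.1 of [5]), one may show that
$\dim(\mathcal{C}_2) \le \dim(\mathcal{C}_1) + \dim(\mathcal{D}) + 1 \le \dim(\mathcal{C}_1) + 3$. As the family $\mathcal{C}_2$ includes
$\mathcal{D}$, it satisfies the conditions of Lemma 6 above: let $V_1$, $V_2$ and $\phi : [0, 1] \to [0, 1]$
be the associated sets and point mapping, respectively, in the lemma.
Define $Z_i = \phi(Y_i)$ for $i \ge 1$ and let $\mathcal{C}_3 = \{\phi(C \cap V_1) : C \in \mathcal{C}_2\}$. Arguments like
those above show that $\Gamma_m(\mathcal{C}_3 : \mathbf{Z}) = \Gamma_m(\mathcal{C}_2 : \mathbf{Y})$ with probability one and that
$\dim(\mathcal{C}_3) \le \dim(\mathcal{C}_2)$.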
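
The claim that the closed dyadic subintervals of [0, 1] have VC dimension 2 can be checked by brute force on a small example. The following Python sketch is illustrative only: the helper names, the truncation to dyadic order 4, and the test grid are assumptions made for the demonstration, not part of the original argument. It exhibits a shattered pair and confirms that no triple on the grid is shattered (an interval containing the outer two points always contains the middle one).

```python
from fractions import Fraction
from itertools import combinations

def dyadic_intervals(max_order):
    """Closed dyadic intervals [j/2^k, (j+1)/2^k] in [0, 1] for k <= max_order."""
    return [(Fraction(j, 2 ** k), Fraction(j + 1, 2 ** k))
            for k in range(max_order + 1) for j in range(2 ** k)]

def traces(points, intervals):
    """Distinct subsets of `points` picked out by the given intervals."""
    return {frozenset(p for p in points if a <= p <= b) for a, b in intervals}

def is_shattered(points, intervals):
    """True if every subset of `points` is the trace of some interval."""
    return len(traces(points, intervals)) == 2 ** len(points)

INTERVALS = dyadic_intervals(4)                    # truncated family (assumption)
GRID = [Fraction(i, 40) for i in range(41)]        # candidate points (assumption)

two_points = (Fraction(3, 10), Fraction(6, 10))
pair_shattered = is_shattered(two_points, INTERVALS)
triple_shattered = any(is_shattered(t, INTERVALS) for t in combinations(GRID, 3))
print(pair_shattered, triple_shattered)
```

Exact rational arithmetic is used so that membership at interval endpoints is not affected by floating-point rounding.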

By Lemma 6, for each set $C \in \mathcal{C}_3$, there is a set $U(C)$ that is equal
to a finite union of intervals and is such that $\lambda(C \triangle U(C)) = 0$. Let
$\mathcal{U} = \{U(C) : C \in \mathcal{C}_3\}$. Then $\Gamma_m(\mathcal{U} : \mathbf{Z}) = \Gamma_m(\mathcal{C}_3 : \mathbf{Z})$ with probability one and it
follows from the other relations established above that $\limsup_m \Gamma_m(\mathcal{U} : \mathbf{Z}) > 0$
with positive probability. Fix $L \ge 1$. By Proposition 3, there exist sets
$U(C_1), \ldots, U(C_L) \in \mathcal{U}$ such that their join has $2^L$ cells and each cell has
positive probability. It follows that the join $J_L = C_1 \vee \cdots \vee C_L$ is also full.
As $L$ was arbitrary, Lemma 1 implies that $\mathcal{C}_3$ has infinite VC dimension,
and the same is therefore true of $\mathcal{C}$. This completes the proof of Theorem 1.

4. Proof of VC-major and VC-graph results. 

4.1. Proof of Proposition 1. Let $\mathbf{X}$ be a stationary ergodic process.
Suppose, first, that $\mathcal{F}$ is bounded, with constant envelope $M < \infty$. Fix $\varepsilon > 0$
and select an integer $K$ such that $2M/K < \varepsilon$. For each $f \in \mathcal{F}$, define the
approximation

\[ \tilde{f}(x) = -M + \frac{2M}{K} \sum_{j=1}^{K} I\bigl(f(x) > -M + 2Mj/K\bigr). \]

Note that $f(x) - \varepsilon \le \tilde{f}(x) \le f(x)$ for each $x \in \mathcal{X}$ and thus, by an elementary
bound,

\[ 0 \le \Gamma_m(\mathcal{F} : \mathbf{X}) \le 2\varepsilon + \Gamma_m(\tilde{\mathcal{F}} : \mathbf{X}), \]

where $\tilde{\mathcal{F}} = \{\tilde{f} : f \in \mathcal{F}\}$. It follows readily from Theorem 1 and the assumption
that $\dim_{VC}(\mathcal{F})$ is finite that $\Gamma_m(\tilde{\mathcal{F}} : \mathbf{X}) \to 0$ with probability one as $m$ tends
to infinity. As $\varepsilon > 0$ was arbitrary, we conclude that $\Gamma_m(\mathcal{F} : \mathbf{X}) \to 0$ with
probability one as well.
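
As a small numerical sanity check of the approximation step (not part of the original proof), the Python sketch below evaluates a staircase approximation of the kind used above and verifies the sandwich bound: the approximation never exceeds the function value and falls short of it by at most 2M/K. The function name `staircase`, the strict inequality inside the indicator, and the sample values of M, K and ε are illustrative assumptions.

```python
from fractions import Fraction

def staircase(value, M, K):
    """Return -M + (2M/K) * #{j in 1..K : value > -M + 2Mj/K}."""
    step = Fraction(2 * M, K)
    count = sum(1 for j in range(1, K + 1) if value > -M + step * j)
    return -M + step * count

M, K = 1, 8                       # envelope and number of levels (illustrative)
eps = Fraction(3, 10)             # any eps with 2M/K < eps works
values = [Fraction(k, 20) for k in range(-20, 21)]   # sample values of f(x) in [-M, M]
approx = [staircase(v, M, K) for v in values]

# Check f(x) - eps <= f~(x) <= f(x) at every sampled value.
sandwich_holds = all(v - eps <= a <= v for v, a in zip(values, approx))
print(sandwich_holds)
```

Because each approximating function is a finite combination of indicators of level sets, uniform convergence over the approximating class reduces to uniform convergence over those level sets.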

Now, suppose that $\mathcal{F}$ has an envelope $F$ such that $EF(X) < \infty$. Fix
$0 < M < \infty$ and for each $f \in \mathcal{F}$, define $f_M(x) = f(x) I(F(x) \le M)$. Let
$\mathcal{F}_M = \{f_M : f \in \mathcal{F}\}$. Then, by an elementary bound and an application of
the ergodic theorem to $F(x) I(F(x) > M)$, we have

\[ 0 \le \limsup_{m \to \infty} \Gamma_m(\mathcal{F} : \mathbf{X}) \le \limsup_{m \to \infty} \Gamma_m(\mathcal{F}_M : \mathbf{X}) + 2E[F(X) I(F(X) > M)]. \]

A straightforward argument shows that $\mathcal{F}_M$ is a VC-major class and therefore,
by the result above, the first term on the right-hand side is equal to zero.
The second term can be made arbitrarily small by choosing $M$ sufficiently
large.

4.2. Proof of Proposition 2. Let $\mathbf{X}$ be a stationary ergodic process with
one-dimensional marginal distribution $\mu$. Let $M < \infty$ be an envelope for
$\mathcal{F}$. Replacing each $f \in \mathcal{F}$ by $(f + M)/2M$, we may assume without loss of
generality that each $f \in \mathcal{F}$ takes values in $[0, 1]$ and, therefore,

\[ G_f = \{(x, s) : x \in \mathcal{X} \text{ and } 0 \le s \le f(x) \le 1\}. \]

Let $Y_1, Y_2, \ldots \in [0, 1]$ be independent, uniformly distributed random variables
defined on the same probability space as, and independent of, the process $\mathbf{X}$.
For $i \ge 1$, define $Z_i = (X_i, Y_i) \in \mathcal{X} \times [0, 1]$. It follows from standard results
in ergodic theory (cf. [14]) that the process $\mathbf{Z} = Z_1, Z_2, \ldots$ is stationary and
ergodic. Let $Z = (X, Y)$ be distributed as $Z_1$. By an application of Fubini's
theorem, for each $f \in \mathcal{F}$, we have

\[ P(Z \in G_f) = (\mu \otimes \lambda)(G_f) = \int_{\mathcal{X}} \lambda((G_f)_x)\, d\mu(x) = \int_{\mathcal{X}} f(x)\, d\mu(x) = Ef(X), \tag{18} \]

where $G_x = \{s : (x, s) \in G\}$ denotes the $x$-section of $G$. Moreover,

\[ \frac{1}{m} \sum_{i=1}^{m} I(Z_i \in G_f) = \frac{1}{m} \sum_{i=1}^{m} I(Y_i \le f(X_i)). \tag{19} \]

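By an elementary bound, $\Gamma_m(\mathcal{F} : \mathbf{X}) \le \Gamma_m^1(\mathcal{F} : \mathbf{Z}) + \Gamma_m^2(\mathcal{F} : \mathbf{Z})$, where

\[ \Gamma_m^1(\mathcal{F} : \mathbf{Z}) = \sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^{m} I(Y_i \le f(X_i)) - Ef(X) \right| \]

and

\[ \Gamma_m^2(\mathcal{F} : \mathbf{Z}) = \sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^{m} \bigl[ I(Y_i \le f(X_i)) - f(X_i) \bigr] \right|. \]

It follows from (18) and (19) that

\[ \Gamma_m^1(\mathcal{F} : \mathbf{Z}) = \sup_{G \in \mathcal{G}} \left| \frac{1}{m} \sum_{i=1}^{m} I(Z_i \in G) - P(Z \in G) \right|, \]

which tends to zero with probability one by Theorem 1 and the assumption
that $\mathcal{G}$ is a VC class. To analyze the second supremum, note that when
$X_1 = x_1, \ldots, X_m = x_m$ are fixed,

\[ \Gamma_m^2(\mathcal{F} : \mathbf{Z}) = \sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^{m} \bigl[ I(Y_i \le f(x_i)) - P(Y_i \le f(x_i)) \bigr] \right| \]

and that $Y_1, \ldots, Y_m$ remain independent under this conditioning. By a routine
modification of standard empirical process arguments like those in Theorem
3.1 of Devroye and Lugosi [5], one may establish that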

\[ E\bigl[\Gamma_m^2(\mathcal{F} : \mathbf{Z}) \mid X_1^m\bigr] \le 2 \sqrt{\frac{\ln 2S_m(\mathcal{G})}{m}} \triangleq L_m. \]

Here, $S_m(\mathcal{G})$ is the (maximal) shatter coefficient of $\mathcal{G}$ defined by

\[ S_m(\mathcal{G}) = \max \bigl| \{ G \cap \{z_1, \ldots, z_m\} : G \in \mathcal{G} \} \bigr|, \]

where the maximum is taken over all $m$-sequences $z_1, \ldots, z_m \in \mathcal{X} \times [0, 1]$. As
$\mathcal{G}$ has finite VC dimension, $V$ say, it follows from Sauer's Lemma A above
that $S_m(\mathcal{G}) \le (m + 1)^V$ and, consequently, that $L_m = O((\ln m / m)^{1/2})$. A
straightforward application of McDiarmid's bounded difference inequality
(cf. Theorem 2.2 of [5]) shows that for $t > 0$,

\[ P\bigl(\Gamma_m^2(\mathcal{F} : \mathbf{Z}) > L_m + t \mid X_1^m\bigr) \le e^{-2mt^2}. \]

Taking expectations, the same bound holds for the unconditional probability
and it then follows from a simple application of the first Borel-Cantelli
lemma that $\Gamma_m^2(\mathcal{F} : \mathbf{Z})$ tends to zero with probability one as $m$ tends to
infinity.
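
To make the two quantitative ingredients of this step concrete, the following Python sketch tabulates the bound on the conditional expectation with the shatter coefficient replaced by the Sauer-type bound $(m+1)^V$, and checks numerically that the exponential tail bound is summable in $m$, which is what the first Borel-Cantelli lemma requires. The helper names and the choices $V = 3$ and $t = 0.1$ are illustrative assumptions.

```python
import math

def sauer_bound(m, V):
    """Upper bound (m + 1)^V on the shatter coefficient of a class of VC dimension V."""
    return (m + 1) ** V

def expectation_bound(m, V):
    """2 * sqrt(ln(2 * S_m) / m), with S_m replaced by its bound (m + 1)^V."""
    return 2.0 * math.sqrt(math.log(2 * sauer_bound(m, V)) / m)

def tail_partial_sum(t, terms=10_000):
    """Partial sum of exp(-2 m t^2) over m = 1..terms (a geometric series)."""
    return sum(math.exp(-2.0 * m * t * t) for m in range(1, terms + 1))

V = 3                                  # illustrative VC dimension
for m in (10 ** 2, 10 ** 4, 10 ** 6):
    print(m, round(expectation_bound(m, V), 4))

t = 0.1
ratio = math.exp(-2.0 * t * t)
geometric_limit = ratio / (1.0 - ratio)    # closed form of the full series
print(abs(tail_partial_sum(t) - geometric_limit) < 1e-6)
```

The bound decays like $(\ln m / m)^{1/2}$, and the tail probabilities form a convergent geometric series for every fixed $t > 0$.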



APPENDIX 

A.1. Proof of Lemma 5. Following arguments like those in Breiman [4],
we may assume, without loss of generality, that $\mathbf{X} = \{X_i : -\infty < i < \infty\}$ is
a two-sided process and that $\mathbf{X}$ is defined on a probability space $(\Omega, \mathcal{F}, P)$
via a left shift transformation and a projection map. Specifically, $\Omega$ is the
set of all bi-infinite sequences $\omega = (\omega_i)_{i=-\infty}^{\infty}$, where $\omega_i \in \mathcal{X}$ for each $i$, and
$\mathcal{F} = \bigotimes_{i=-\infty}^{\infty} \mathcal{S}$ is the usual product sigma-field. We may further assume that
$X_i(\omega) = X_0(T^i \omega)$, where $X_0 : \Omega \to \mathcal{X}$ is the coordinate projection $X_0(\omega) = \omega_0$
and $T : \Omega \to \Omega$ is the standard left-shift transformation defined by
$(T\omega)_i = \omega_{i+1}$. The stationarity of $\mathbf{X}$ implies that $T$ and $T^{-1}$ preserve $P(\cdot)$. Ergodicity
of $\mathbf{X}$ ensures that $T$ is ergodic: if $TA = A$, then $P(A) = 0$ or $1$.

As noted by Steele [21], the subadditive ergodic theorem implies that the
random variables $\Gamma_m(\mathcal{C} : \mathbf{X})$ converge with probability one to a constant. In
particular, if $\limsup_m \Gamma_m(\mathcal{C} : \mathbf{X}) > 0$ with positive probability, then it follows
that

\[ \liminf_{m} \Gamma_m(\mathcal{C} : \mathbf{X}) > 0 \quad \text{with probability one.} \tag{20} \]

This stronger converse of the Glivenko-Cantelli property will be needed in
what follows.

Let $A = \{x \in \mathcal{X} : \mu(\{x\}) = 0\}$ contain the nonatomic points of $\mathcal{X}$. If $A^c = \emptyset$,
then $\mathcal{X}$ is uncountable and there is nothing else to prove. Assume, then,
that $A^c \neq \emptyset$. As $A^c$ consists of the (finite or countable) set of points in
$\mathcal{X}$ having positive $\mu$-measure, it follows that $A \in \mathcal{S}$. Given $\varepsilon > 0$, we may
express $A^c$ as a disjoint union $A_1 \cup A_2$ such that the cardinality of $A_1$ is
finite and $\mu(A_2) < \varepsilon$. Let $\mu_m(A) = m^{-1} \sum_{i=1}^{m} I(X_i \in A)$ denote the empirical
measure of $X_1, \ldots, X_m$. By an elementary bound,

\[ \Gamma_m(\mathcal{C} : \mathbf{X}) \le \Gamma_m(\mathcal{C} \cap A : \mathbf{X}) + \sum_{x \in A_1} |\mu_m(\{x\}) - \mu(\{x\})| + \mu_m(A_2) + \mu(A_2). \]

As $m$ tends to infinity, the second term above tends to zero and the last two
terms are together less than $2\varepsilon$. As $\varepsilon > 0$ was arbitrary, we conclude that
$\mu(A) > 0$ so that $\mathcal{X}$ is uncountable. Moreover, (20) implies that
$\liminf_m \Gamma_m(\mathcal{C} \cap A : \mathbf{X}) > 0$ with probability one.

Let $\Omega_A$ denote the set of $\omega \in \Omega$ such that $\omega_0 \in A$ and both index sets
$\{i \ge 1 : \omega_i \in A\}$ and $\{i \le -1 : \omega_i \in A\}$ are infinite. By the ergodic theorem,
$P(\Omega_A) = \mu(A) > 0$. For $\omega \in \Omega_A$, define $\tau(\omega) = \min\{k \ge 1 : T^k \omega \in \Omega_A\}$ (which
is finite) and the induced transformation $\tilde{T} : \Omega_A \to \Omega_A$ by $\tilde{T}\omega = T^{\tau(\omega)}\omega$.
Routine arguments from ergodic theory [14] show that $\tilde{T}$ is invertible, is
measurable on the restricted sigma-field $\mathcal{F}_A = \mathcal{F} \cap \Omega_A$, preserves the normalized
measure $P_A(\cdot) = P(\cdot)/P(\Omega_A)$ on $(\Omega_A, \mathcal{F}_A)$ and is ergodic. For the sake of
completeness, we provide a sketch of the proofs using a geometric argument
from ergodic theory known as the Kakutani skyscraper. For each positive
integer $k$, define $A_k = \{\omega \in \Omega_A : \tau(\omega) = k\}$. The sets $A_1, A_2, \ldots$ then partition
$\Omega_A$. Moreover, $\bigcup_{k=1}^{\infty} \bigcup_{l=0}^{k-1} T^l A_k$ is a disjoint union containing almost every
point in $\Omega$. The Kakutani skyscraper of $\Omega_A$ is created by stacking the sets
$T^1 A_k, \ldots, T^{k-1} A_k$ above $A_k$ for each $k \ge 1$.
The measurability of $\tilde{T}$ follows from the fact that each $A_k$ is measurable
and that $\tilde{T}$ restricted to $A_k$ equals $T^k$ restricted to $A_k$. Invertibility
of $\tilde{T}$ follows directly from the invertibility of $T$ and the construction
of the Kakutani skyscraper. In particular, let $\omega_1 \neq \omega_2$ be points in $\Omega_A$.
Then $\tilde{T}(\omega_1) = T(T^{\tau(\omega_1)-1}\omega_1)$ and $\tilde{T}(\omega_2) = T(T^{\tau(\omega_2)-1}\omega_2)$. As $T$ is
invertible, and $T^{\tau(\omega_1)-1}(\omega_1)$ and $T^{\tau(\omega_2)-1}(\omega_2)$ are distinct points in the Kakutani
skyscraper, it follows that $\tilde{T}(\omega_1) \neq \tilde{T}(\omega_2)$. The measure-preserving property
of $\tilde{T}$ follows from the fact that $\tilde{T} = T^k$ is measure preserving on each of the sets
$A_k$. To establish ergodicity, suppose that $B \subseteq \Omega_A$ is a set of positive measure
that is invariant for $\tilde{T}$. The set $C = \bigcup_{i=-\infty}^{\infty} T^i B$ is invariant under $T$, and
$C \cap \Omega_A = B$ since $B$ is invariant for $\tilde{T}$. As $T$ is ergodic, $C$ contains almost
every point of $\Omega_A$ and so $B = \Omega_A$ up to a set of measure zero. It follows that
$\tilde{T}$ is ergodic.

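Define $\tilde{X}_0 : \Omega_A \to \mathcal{X}$ by $\tilde{X}_0(\omega) = \omega_0$ and $\tilde{X}_i(\omega) = \tilde{X}_0(\tilde{T}^i \omega)$ for $-\infty < i < \infty$.
The process $\tilde{\mathbf{X}} = \{\tilde{X}_i\}$ defined on $(\Omega_A, \mathcal{F}_A, P_A)$ is then stationary and
ergodic, takes values in $(\mathcal{X}, \mathcal{S})$ and has marginal distribution
$\mu_A(\cdot) = \mu(\cdot \cap A)/\mu(A)$ with no point masses.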

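We wish to show that $\limsup_m \Gamma_m(\mathcal{C} : \tilde{\mathbf{X}}) > 0$ with positive $P_A$-probability.
To this end, for each $\omega \in \Omega_A$, define $\tau_0(\omega) = 0$, $\tau_1(\omega) = \tau(\omega)$ and
$\tau_{l+1}(\omega) = \min\{k > \tau_l(\omega) : \omega_k \in A\}$. By definition of $\Omega_A$, each function $\tau_l$ is finite. For
each $m \ge 1$, $C \in \mathcal{C}$ and $\omega \in \Omega_A$,

\[ \frac{1}{m} \sum_{i=0}^{m-1} I(\tilde{X}_i(\omega) \in C) = \frac{1}{m} \sum_{j=0}^{\tau_{m-1}(\omega)} I(X_j(\omega) \in C \cap A) = \frac{W_m}{\mu(A)} \cdot \frac{1}{\tau_{m-1}(\omega)} \sum_{j=0}^{\tau_{m-1}(\omega)} I(X_j(\omega) \in C \cap A), \tag{21} \]

where we have defined $W_m = \mu(A)\tau_{m-1}/m$. By the ergodic theorem, for
$P_A$-almost every $\omega \in \Omega_A$,

\[ \frac{m}{\tau_{m-1}(\omega)} = \frac{1}{\tau_{m-1}(\omega)} \sum_{j=0}^{\tau_{m-1}(\omega)} I(X_j(\omega) \in A) \to \mu(A) \]

as $m$ tends to infinity and so $W_m \to 1$ with $P_A$-probability one. Omitting
the dependence on $\omega$, it follows from (21) and the definition of $\mu_A(\cdot)$ that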


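\[
\begin{aligned}
\Gamma_m(\mathcal{C} : \tilde{\mathbf{X}}) &= \sup_{C \in \mathcal{C}} \left| \frac{1}{m} \sum_{i=0}^{m-1} I(\tilde{X}_i \in C) - \mu_A(C) \right| \\
&= \frac{1}{\mu(A)} \sup_{C \in \mathcal{C}} \left| \frac{W_m}{\tau_{m-1}} \sum_{j=0}^{\tau_{m-1}} I(X_j \in C \cap A) - \mu(C \cap A) \right| \\
&\ge \frac{1}{\mu(A)} \Gamma_{\tau_{m-1}}(\mathcal{C} \cap A : \mathbf{X}) - |W_m - 1| \sup_{C \in \mathcal{C}} \frac{1}{\mu(A)\,\tau_{m-1}} \sum_{j=0}^{\tau_{m-1}} I(X_j \in C \cap A) \\
&\ge \frac{1}{\mu(A)} \Gamma_{\tau_{m-1}}(\mathcal{C} \cap A : \mathbf{X}) - |W_m - 1|.
\end{aligned}
\]

The first inequality above follows by writing $W_m = 1 + (W_m - 1)$ and then
using the elementary bound $\sup_\alpha |a_\alpha - b_\alpha| \ge \sup_\alpha |a_\alpha| - \sup_\alpha |b_\alpha|$. It follows
from the last display that

\[
\begin{aligned}
\limsup_{m} \Gamma_m(\mathcal{C} : \tilde{\mathbf{X}}) &\ge \liminf_{m} \Gamma_m(\mathcal{C} : \tilde{\mathbf{X}}) \\
&\ge \frac{1}{\mu(A)} \liminf_{m} \Gamma_{\tau_{m-1}}(\mathcal{C} \cap A : \mathbf{X}) \\
&\ge \frac{1}{\mu(A)} \liminf_{m} \Gamma_m(\mathcal{C} \cap A : \mathbf{X})
\end{aligned}
\]

and the argument above shows that the final term is positive with
$P_A$-probability one. This completes the proof.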
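
The bookkeeping behind the induced process can be illustrated on a short finite path. The Python sketch below is a toy example under stated assumptions (the sample path, the sets `in_A` and `in_C`, and the helper `return_times` are all hypothetical): it extracts the successive visit times to A, forms the first m terms of the induced process, and checks that the average of the indicator of C over the induced process equals the count of visits to C ∩ A along the original path up to the last visit time, divided by m.

```python
def return_times(seq, in_A, m):
    """Indices tau_0 < tau_1 < ... < tau_{m-1} of the first m visits of seq to A."""
    visits = [j for j, x in enumerate(seq) if in_A(x)]
    return visits[:m]

# Illustrative finite path; 'a' plays the role of an atom that is removed from A.
seq = ['u', 'a', 'v', 'a', 'a', 'w', 'u', 'a', 'v', 'w']

def in_A(x):
    return x != 'a'            # A: the part of the space kept by the construction

def in_C(x):
    return x in ('u', 'a')     # a set C from the family

m = 4
taus = return_times(seq, in_A, m)          # requires seq[0] in A, so tau_0 = 0
induced = [seq[j] for j in taus]           # first m terms of the induced process

# Average over the induced process versus count along the original path.
lhs = sum(in_C(x) for x in induced) / m
rhs = sum(in_C(seq[j]) and in_A(seq[j]) for j in range(taus[-1] + 1)) / m
print(taus, induced, lhs, rhs)
```

Only the times at which the path lies in A contribute to the right-hand count, which is why the two averages agree exactly.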

A.2. Proof of Lemma 6. The isomorphism $\phi$ is defined as a limit of
isomorphisms $\phi_n$. The maps $\phi_n$ are defined inductively. To begin, let

\[
\phi_1(x) = \begin{cases} \lambda([0, x] \cap C_1), & \text{if } x \in C_1, \\ \lambda(C_1) + \lambda([0, x] \cap C_1^c), & \text{if } x \in C_1^c. \end{cases}
\]

Then $\phi_1$ maps $C_1$ into $[0, \lambda(C_1)]$ and $C_1^c$ into $[\lambda(C_1), 1]$. By standard
arguments, $\phi_1$ is Lebesgue measure-preserving and a bijection almost everywhere.

Suppose now that maps $\phi_1, \ldots, \phi_n$ have been defined in such a way that:
(i) for each element $A$ of the join $J_n = \bigvee_{i=1}^{n} C_i$ and each $x \in A$,
$\phi_n(x) = \beta_n(A) + \lambda([0, x] \cap A)$, where $\beta_n(A)$ is a constant; (ii) the intervals
$\{[\beta_n(A), \beta_n(A) + \lambda(A)) : A \in J_n\}$ form a disjoint covering of $[0, 1)$. For each
$A \in J_n$ and each $x \in A$, define

\[
\phi_{n+1}(x) = \begin{cases} \beta_n(A) + \lambda([0, x] \cap A \cap C_{n+1}), & \text{if } x \in A \cap C_{n+1}, \\ \beta_n(A) + \lambda(A \cap C_{n+1}) + \lambda([0, x] \cap A \cap C_{n+1}^c), & \text{if } x \in A \cap C_{n+1}^c. \end{cases}
\]

With these definitions, properties (i) and (ii) hold for $J_{n+1}$ and $\phi_{n+1}$.
Moreover, $\phi_1, \phi_2, \ldots$ have the property that for each $n$, each cell $A \in J_n$ and each
$m \ge n$, the function $\phi_m$ is a Lebesgue-measure-preserving almost everywhere
bijection from $A$ into $[\beta_n(A), \beta_n(A) + \lambda(A)]$. In particular, for each $A \in J_n$
and each $m \ge n$,

\[ \mathrm{cl}(\phi_n(A)) = \mathrm{cl}(\phi_m(A)) = [\beta_n(A), \beta_n(A) + \lambda(A)], \]

where $\mathrm{cl}(U)$ denotes the closure of $U$.
Fix $x \in [0, 1]$ for the moment and, for $n \ge 1$, let $A_n(x)$ be the cell of $J_n$
containing $x$. Note that the sequence $\phi_n(x), \phi_{n+1}(x), \ldots$ is contained in the
interval $\mathrm{cl}(\phi_n(A_n(x)))$, whose diameter is equal to $\lambda(A_n(x)) \le \mathrm{diam}(A_n(x))$. By
assumption, the latter quantity tends to zero as $n \to \infty$ and so $\{\phi_n(x) : n \ge 1\}$
is a Cauchy sequence. Let $\phi(x)$ denote its limit. Then $\phi(\cdot)$ is a limit of
measurable functions, hence measurable.

We claim that $\mathrm{cl}(\phi(A)) = \mathrm{cl}(\phi_n(A))$ for every $n \ge 1$ and every $A \in J_n$. To
see this, fix $A \in J_n$. If $y \in \mathrm{cl}(\phi(A))$, then there exist $x_1, x_2, \ldots \in A$ such that
$\phi(x_m) \to y$. By definition of $\phi(\cdot)$, there exist integers $r_1, r_2, \ldots$ tending to
infinity such that $\phi_{r_m}(x_m) \to y$. As each value $\phi_{r_m}(x_m) \in \phi_{r_m}(A) \subseteq \mathrm{cl}(\phi_n(A))$,
we have $y \in \mathrm{cl}(\phi_n(A))$. Thus, $\mathrm{cl}(\phi(A)) \subseteq \mathrm{cl}(\phi_n(A))$, the latter set being equal
to the interval $I_A = [\beta_n(A), \beta_n(A) + \lambda(A)]$. To establish the converse, let
$y_0 \in I_A^o$ and $\varepsilon > 0$ be such that $I_0 = (y_0 - \varepsilon, y_0 + \varepsilon) \subseteq I_A$. By the shrinking
diameter assumption on the joins $J_m$, there exists an integer $m$ and a cell $A' \in J_m$
such that $A' \subseteq A$ and $\mathrm{cl}(\phi_m(A')) \subseteq I_0$ has positive measure. Thus, if $x \in A'$,
then $\phi_r(x) \in \mathrm{cl}(\phi_m(A'))$ for $r \ge m$ and, therefore, $\phi(x) \in I_0$. As $\varepsilon > 0$ was
arbitrary, it follows that $I_A^o \subseteq \mathrm{cl}(\phi(A))$ and, consequently, $I_A \subseteq \mathrm{cl}(\phi(A))$ as
well.

We now establish that the map $\phi$ preserves Lebesgue measure. To this
end, for each $n \ge 1$, we define

\[ \mathcal{Q}_n = \{\mathrm{cl}(\phi(A)) : A \in J_n\} \cup \{\{\beta_n(A)\} : A \in J_n\} \cup \{\{1\}\} \]

to be the collection of intervals into which the elements of $J_n$ are mapped
and the endpoints of these intervals. We wish to show that $\lambda(\phi^{-1}B) = \lambda(B)$
for each $B \in \mathcal{Q}_n$. First, suppose that $a$ is the endpoint of some interval
$\mathrm{cl}(\phi(A'))$ with $A' \in J_n$. Fix $\varepsilon > 0$ and let $m \ge n$ be large enough so that
$\max\{\lambda(A) : A \in J_m\} < \varepsilon/2$. Let $A_1, \ldots, A_r$ be those elements of $J_m$ such
that $\mathrm{cl}(\phi_m(A_j))$ contains the point $a$. Then $\phi^{-1}\{a\} \subseteq \bigcup_{j=1}^{r} A_j$ and at most
two of the sets $A_j$ can have positive measure. It follows that $\lambda(\phi^{-1}\{a\}) < \varepsilon$,
and, as $\varepsilon > 0$ was arbitrary, we have $\lambda(\phi^{-1}\{a\}) = 0$. Now, suppose that
$B \in \mathcal{Q}_n$ is of the form $B = \mathrm{cl}(\phi(A)) = [a_1, a_2]$ for some element $A \in J_n$.
Then $\phi^{-1}B = A \cup \phi^{-1}\{a_1\} \cup \phi^{-1}\{a_2\}$ and therefore

\[ \lambda(\phi^{-1}B) = \lambda(A) = \lambda(\mathrm{cl}(\phi_n(A))) = \lambda(\mathrm{cl}(\phi(A))) = \lambda(B). \]

It follows from these arguments that $\lambda(\phi^{-1}B) = \lambda(B)$ for each $B \in \bigcup_{m \ge 1} \mathcal{Q}_m$.
As the latter collection generates the Borel sigma-field of $[0, 1]$ and is closed
under intersections, $\phi$ preserves Lebesgue measure.

Next, we show that $\phi$ is one-to-one on a Borel subset of $[0, 1]$ with full
measure. Let $Q^0 = \bigcup_{m=1}^{\infty} \{\beta_m(A) : A \in J_m\} \cup \{1\}$ be the (countable) set
of endpoints of the intervals $\{\mathrm{cl}(\phi(A)) : A \in J_m, m \ge 1\}$. Since $\phi$ preserves
Lebesgue measure, $\lambda(\phi^{-1}Q^0) = 0$. Define $V_1 = [0, 1] \setminus \phi^{-1}Q^0$, so that $\lambda(V_1) = 1$.
Let $x_1$ and $x_2$ be distinct points in $V_1$. Since the diameters of the elements
of $J_n$ tend to zero, there exists an $n$ such that $x_1$ and $x_2$ are contained in
different elements of $J_n$. Thus, $\phi_n$ maps $x_1$ and $x_2$ to distinct intervals,
which may intersect only at their endpoints. Hence, $\phi$ also maps $x_1$ and
$x_2$ to distinct intervals. Since $V_1$ excludes points that map to endpoints of
these intervals, $\phi(x_1) \neq \phi(x_2)$. Therefore, $\phi$ is a bijection on $V_1$ and we have
established conclusion (i) of the lemma.

Conclusion (ii) of the lemma follows from (i) and general results concerning
measurable maps of complete separable metric spaces; see Corollary 3.3
of Parthasarathy [12]. To establish (iii), note that for any measurable subset
$A \subseteq V_1$, $\lambda(\phi(A)) = \lambda(\phi^{-1}(\phi(A))) = \lambda(A)$ since $\phi$ is measure-preserving and
one-to-one on $V_1$.

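To establish conclusion (iv), let $C \in \mathcal{C}$. There then exist positive integers $k$
and $n$ such that $C = \bigcup_{i=1}^{k} A_i$, where $A_1, A_2, \ldots, A_k$ are (disjoint) cells in $J_n$.
Let $U(C) = \bigcup_{i=1}^{k} [\beta_n(A_i), \beta_n(A_i) + \lambda(A_i)]$. Then
$\phi(C) = \bigcup_{i=1}^{k} \phi(A_i) \subseteq \bigcup_{i=1}^{k} \mathrm{cl}(\phi(A_i)) = U(C)$ and
$\lambda(\phi(C)) = \sum_{i=1}^{k} \lambda(A_i) = \lambda(U(C))$. Thus,
$\lambda(\phi(C) \triangle U(C)) = \lambda(U(C) \setminus \phi(C)) = 0$.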

Remark. The condition that the cells of the joins have diminishing
diameters, rather than measures tending to zero, is necessary. If, for example,
$C_n = \bigcup_{i=0}^{2^n - 1} \bigl[ \tfrac{2i}{2^{n+1}}, \tfrac{2i+1}{2^{n+1}} \bigr)$ for positive integers $n$, then the limiting map is
$\phi(x) = 2x \bmod 1$.
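
The remark can be explored numerically. The Python sketch below approximates the inductively defined maps on a grid, assuming that the sets in the example are the unions of alternate dyadic intervals of order n + 1 (so that membership in the n-th set is decided by the (n + 1)-th binary digit of x). It relies on the observation that the construction orders the cells of the join lexicographically by membership, with points inside each set placed before points outside, and orders points within a cell by position. The function names, the grid size and the choice n = 6 are assumptions made for illustration.

```python
def phi_n_on_grid(indicators, grid):
    """Approximate phi_n at each grid point by ranking the points.

    Points are sorted by (membership signature, position): cells of the join
    are ordered lexicographically with 'inside C_i' before 'outside C_i',
    and points within a cell keep their natural order.
    """
    def sort_key(k):
        x = grid[k]
        signature = tuple(0 if inside(x) else 1 for inside in indicators)
        return (signature, x)

    order = sorted(range(len(grid)), key=sort_key)
    phi = [0.0] * len(grid)
    for rank, k in enumerate(order):
        phi[k] = (rank + 0.5) / len(grid)   # rank / N approximates Lebesgue measure
    return phi

def make_indicator(j):
    """Indicator of C_j: the (j+1)-th binary digit of x is 0 (assumed form of the sets)."""
    return lambda x: int(x * 2 ** (j + 1)) % 2 == 0

n = 6                                        # number of sets used (illustrative)
N = 2 ** 12                                  # grid size (illustrative)
grid = [(k + 0.5) / N for k in range(N)]     # midpoints of a uniform grid on [0, 1)
phi = phi_n_on_grid([make_indicator(j) for j in range(1, n + 1)], grid)

# Compare with the doubling map x -> 2x mod 1.
max_gap = max(abs(p - (2 * x) % 1.0) for p, x in zip(phi, grid))
print(max_gap < 2 ** -n)
```

In this example the first binary digit of x is never examined, so the points x and x + 1/2 always fall in the same cell: the cells have measure tending to zero but diameter at least 1/2, and the limiting map is two-to-one rather than one-to-one.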